Medical care in the United States cost Americans an average of approximately 8,500 per person per year, the equivalent of 18% of US GDP annually. This is by far the most of any country in the world, both as a dollar ammount and as a percentage of GDP. Most of this cost comes from emergency medical care and scheduled care at hospital facilities.
Although our health care purchases are arguably the most important ones we will make as consumers, there is remarkably little information on how much different medical procedures cost and what we are receiving in exchange for our money. Supposedly higher quality of care is more expensive, but is this actually true? We used R and Tableau to explore these disparities in the cost of care and their relationship to care quality.
source("RPlots/ConnectToOracle.R", echo = TRUE)
##
## > library(shiny)
##
## > library(shinyapps)
##
## Attaching package: 'shinyapps'
##
## The following object is masked from 'package:shiny':
##
## hr
##
## > library(rJava)
##
## > library(DBI)
##
## > library(RJDBC)
##
## > library(ggplot2)
##
## > options(java.parameters = "-Xmx2g")
##
## > jdbcDriver <- JDBC(driverClass = "oracle.jdbc.OracleDriver",
## + classPath = "~/ojdbc7.jar")
##
## > con <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@128.83.138.158:1521/pdborcl",
## + "ds_medicare", "orcl")
We ran these querries to Import Data into R from the ds_medicare database.
source("RPlots/SQLQuerries.R", echo = TRUE)
##
## > HCAHPSMeasure = dbGetQuery(con, "Select * From Measures")
##
## > InpatientServices <- dbGetQuery(con, "Select * from InpatientServices")
##
## > Providers = dbGetQuery(con, "Select * from Providers")
##
## > OutpatientServices <- dbGetQuery(con, "Select * from OutpatientServices")
##
## > OutpatientVisits <- dbGetQuery(con, "select * from Outpatient WHERE ID BETWEEN 0 and 30000")
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 30001 and 60000"))
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 60001 and 90000"))
##
## > OutpatientVisits = rbind(OutpatientVisits, dbGetQuery(con,
## + "select * from Outpatient WHERE ID BETWEEN 90001 and 100000"))
##
## > InpatientVisits <- dbGetQuery(con, "select * from Inpatient WHERE ID BETWEEN 0 and 30000")
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 30001 and 60000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 60001 and 90000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 90001 and 100000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 100001 and 130000"))
##
## > InpatientVisits = rbind(InpatientVisits, dbGetQuery(con,
## + "select * from Inpatient WHERE ID BETWEEN 130001 and 160000"))
##
## > outpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByHospital = dbGetQuery(con, "\n SELECT Providers.Name as Hospital, AVG(OutPatient.AverageSubmitt ..." ... [TRUNCATED]
##
## > outpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > outpatientCostByHospital = dbGetQuery(con, "SELECT Providers.Name as Hospital, AVG(OutPatient.AverageSubmittedCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByCity = dbGetQuery(con, "SELECT Providers.City as City, AVG(InPatient.CoveredCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByState = dbGetQuery(con, "SELECT Providers.State as State, AVG(Inpatient.CoveredCharges) as AvgBilledCost \n ..." ... [TRUNCATED]
##
## > InpatientCostByHospital = dbGetQuery(con, "\n SELECT Providers.Name as Hospital, AVG(InPatient.CoveredCharges) a ..." ... [TRUNCATED]
##
## > PatientsRated9or10 = dbGetQuery(con, "\n Select Providers.Name, Reviews.AnswerPercent FROM Providers\n ..." ... [TRUNCATED]
##
## > PatientsRated9or10$ANSWERPERCENT <- as.numeric(PatientsRated9or10$ANSWERPERCENT)
##
## > CostVSRating = dbGetQuery(con, "\n Select Reviews.Answerpercent AS Rating, Reviews.SurveyID AS Question, \n ..." ... [TRUNCATED]
##
## > Rated9or10 = subset(CostVSRating, QUESTION == "H_HSP_RATING_9_10")
##
## > Rated7or8 = subset(CostVSRating, QUESTION == "H_HSP_RATING_7_8")
##
## > Rated0to6 = subset(CostVSRating, QUESTION == "H_HSP_RATING_0_6")
##
## > DefinitelyRecommend = subset(CostVSRating, QUESTION ==
## + "H_RECMND_DY")
##
## > ProbablyRecommend = subset(CostVSRating, QUESTION ==
## + "H_RECMND_PY")
##
## > NotRecommend = subset(CostVSRating, QUESTION == "H_RECMND_DN")
##
## > TexasQuery = subset(CostVSRating, STATE == "TX")
##
## > AustinQuery = subset(TexasQuery, REGION == "TX - Austin")
##
## > AverageCostBy910Rating <- aggregate(cbind(COST, INSUREDCOST) ~
## + REGION, Rated9or10, mean)
##
## > CheaperOutpatient = subset(AverageCostBy910Rating,
## + COST < 2000)
##
## > InpatientVisits$TOTALPAYMENTS <- as.numeric(InpatientVisits$TOTALPAYMENTS)
##
## > p <- subset(OutpatientVisits, APCID == 12)
##
## > p <- mean(p$AVERAGESUBMITTEDCHARGES)
##
## > TexasCostByProcedure <- aggregate(INSUREDCOST ~ REGION,
## + TexasQuery, mean)
source("RPlots/Plots2.R", echo = TRUE)
##
## > p3 <- hist(InpatientVisits$TOTALPAYMENTS, main = "Inpatient Procedure Cost",
## + xlab = "Average Ammount Billed Per Procedure", ylab = "# of Hosp ..." ... [TRUNCATED]
##
## > p4 <- hist(OutpatientVisits$AVERAGESUBMITTEDCHARGES,
## + main = "Outpatient Procedure Cost", xlab = "Average Amount Billed Per Procedure",
## + .... [TRUNCATED]
##
## > p5 <- hist(PatientsRated9or10$ANSWERPERCENT, main = "Patient Satisfaction \nRatings",
## + xlab = "Percent of Patients Who Rated \n Their Hospital ..." ... [TRUNCATED]
##
## > "\np1 <- ggplot(InpatientCostByState, aes(x = STATE, y = AVGBILLEDCOST)) + geom_point() + coord_flip()\np2 <- ggplot(outpatientCostByState, aes(x = ..." ... [TRUNCATED]
## [1] "\np1 <- ggplot(InpatientCostByState, aes(x = STATE, y = AVGBILLEDCOST)) + geom_point() + coord_flip()\np2 <- ggplot(outpatientCostByState, aes(x = STATE, y = AVGBILLEDCOST)) + geom_point() + coord_flip()\np7 <- ggplot(Rated9or10, aes(x = RATING, y = COST)) + geom_point() + facet_wrap(~PROCEDURE)\np8 <- ggplot(Rated9or10, aes(x = RATING, y = COST)) + geom_point() + facet_wrap(~STATE)\np10 <- ggplot(TexasCostByProcedure, aes(x = Description, y = UNINSUREDCOST)) + geom_point() + coord_flip()\np11 <- ggplot(TexasQuery, aes(x = RATING, y = COST)) + geom_point() + facet_wrap(~PROCEDURE)\np12 <- ggplot(AustinQuery, aes(x = RATING, y = COST)) + geom_point() + facet_wrap(~PROCEDURE)\n"
Even within Austin, there is a wide disparity in the cost of care, shown here for both insured and uninsured patients and by procedure:
We did find that higher cost was somewhat positively correlated with hospital volume (the number of patients seen):
And slightly more so nationally:
And although we found that cost also had a slight correlation with hospital quality locally:
That correlation dropped off at the national level:
We did see that insured cost was highly correlated with insured cost:
And that states with high costs for insured patients tended towards voting Democratic, with younger and sparser populations.
We also found that, possibly because of the high cost of medical care, there were a large number of people forgoing medical care because of cost.
And that the people who needed health care the most were the least likely to receive it.
They also tend to be poor, uninsured, and have poor mental health. !(Screenshots/ForgoIncome) !(Screenshots/ForgoInsured)
This was less true for those 65 and older, however, who were the least likely age group to forgo medical care because of cost.
In our research, we wanted to identify factors with high correlation to the existence of depression or other mental health disorders.
The dataset which was used included answers from a phone survey that took place in 2011-2012. Most questions related to behaviors resulting in leading causes of premature mortality and morbidity among adults. Over 139.000 responses were collected to 177 such questions.
18 attributes were chosen and extracted from the dataset as the basis of our examination in order to determine which factors have the highest correlation to mental disorders or depression. After all null values were eliminated from our selection, we had a remaining 108.544 rows.
| Attribute | Meaning |
|---|---|
| ADDEPEV2 | Ever told you had a depressive disorder? |
| AGE | Age |
| EDUCAG | Level of education (1-4) |
| EMPLOY | Employment Status (1-5) |
| EXERANY1 | Exercise in past 30 days |
| GENHLTH | Would you say that in general your health is? (1-5; Excellent-Poor) |
| HLTHPLN1 | Do you have any kind of health coverage? |
| INCOME2 | Income Level (1-8) |
| INCOMG | Computed Income Categories (1-5) |
| MEDCOST | Was there a time in the past 12 months, when you could not see a doctor because of the cost? |
| RFBING5 | Binge Drinking Calculated Variable |
| RFBMI5 | Overweight or obese calculated variable (BMI > 25.0) |
| RFDRHV4 | Heavy Alcohol Consumption Calculated Variable |
| RFHLTH | Adults with good or better health |
| RFSMOK3 | Adults who are currently smokers |
| SEX | Sex |
| VETERAN3 | Are you a veteran? |
Each feature created by NMF is a linear combination of the original attribute set. Each feature has a set of coefficients, which are a measure of the weight of each attribute on the feature. There is a separate coefficient for each numerical attribute and for each distinct value of each categorical attribute. The coefficients are all non-negative.